In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np



In [2]:

    
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid")

# Load the example dataset of brain network correlations
df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)

# Pull out a specific subset of networks
used_networks = [1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
                          .astype(int)
                          .isin(used_networks))
df = df.loc[:, used_columns]

# Compute the correlation matrix and average over networks
corr_df = df.corr().groupby(level="network").mean()
corr_df.index = corr_df.index.astype(int)
corr_df = corr_df.sort_index().T

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 6))

# Draw a violinplot with a narrower bandwidth than the default
sns.violinplot(data=corr_df, palette="Set3", bw=.2, cut=1, linewidth=1)

# Finalize the figure
ax.set(ylim=(-.7, 1.05))
sns.despine(left=True, bottom=True)









    




Scope

Representation: 2D and 3D plots
Provide idioms:
    "Business graphics": line, bar, and scatter plots
    "Statistics plots": box-and-whisker plots
    "Higher-dimensional data": heatmaps
In-class problem solving for a task



In [3]:

    
def mm(s_conc, vmax, km):
    """
    :param np.array s_conc: substrate concentrations
    :param float vmax: maximum reaction rate
    :param float km: substrate concentration at which the rate is half of vmax
    :return np.array: reaction rates
    """
    result = vmax*s_conc/(s_conc+km)
    return result
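
As a quick sanity check of the formula, the sketch below evaluates the rate at two concentrations: at `s_conc == km`, where the rate should be half of `vmax`, and at a very large concentration, where it should approach `vmax`. The function is repeated so that the block runs on its own; the names `half_rate` and `high_rate` are introduced here only for illustration.

```python
import numpy as np

def mm(s_conc, vmax, km):
    """Michaelis-Menten rate, same formula as the cell above."""
    return vmax * s_conc / (s_conc + km)

vmax, km = 4.0, 0.4  # the values used in the next cell

half_rate = mm(np.array([km]), vmax, km)[0]   # rate when s_conc equals km
high_rate = mm(np.array([1e6]), vmax, km)[0]  # rate at a very large concentration

print(half_rate)  # half of vmax
print(high_rate)  # slightly below vmax
```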



In [4]:

    
s_conc = np.array([m+0.1 for m in range(100)])
plt.plot(s_conc,mm(s_conc, 4, .4), 'b.-')









    Out[4]:





[<matplotlib.lines.Line2D at 0x7fce4c61f450>]



In [5]:

    
s_conc = np.array([m+0.1 for m in range(100)])
params = [(5, 0.5), (5, 20), (10, 0.5), (15, 20)]  # (vmax, km) pairs
ymax = max([x for (x, y) in params])  # largest vmax, used as the shared y-limit
nplots = len(params)
fig = plt.figure()

# One subplot per (vmax, km) pair, all with the same axis limits
for cur, (vmax, km) in enumerate(params, start=1):
    ax = fig.add_subplot(nplots, 1, cur)
    ax.axis([0, len(s_conc), 0, ymax])
    ax.set_yticks([0, ymax])
    ax.set_yticklabels([0, ymax])
    ax.plot(s_conc, mm(s_conc, vmax, km), 'b.-')

plt.show()



In [6]:

    
# Parameter plot
km = [y for (x,y) in params]
vmax = [x for (x,y) in params]
plt.axis([0, max(km)+2, 0, max(vmax)+ 2])
plt.xlabel('K_M')
plt.ylabel('V_MAX')
plt.plot(km, vmax, 'bo ')









    Out[6]:





[<matplotlib.lines.Line2D at 0x7fce4c698050>]

Visualization in Python

There are many Python packages for visualization. We'll start with the most popular one, matplotlib, and we'll use the trip data.



In [7]:

    
import pandas as pd
import matplotlib.pyplot as plt
# The following ensures that the plots are in the notebook
%matplotlib inline
# We'll also use capabilities in numpy
import numpy as np
df = pd.read_csv("2015_trip_data.csv")



In [8]:

    
df.head()









    Out[8]:






  
    
      
	trip_id	starttime	stoptime	bikeid	tripduration	from_station_name	to_station_name	from_station_id	to_station_id	usertype	gender	birthyear
0	431	10/13/2014 10:31	10/13/2014 10:48	SEA00298	985.935	2nd Ave & Spring St	Occidental Park / Occidental Ave S & S Washing...	CBD-06	PS-04	Annual Member	Male	1960
1	432	10/13/2014 10:32	10/13/2014 10:48	SEA00195	926.375	2nd Ave & Spring St	Occidental Park / Occidental Ave S & S Washing...	CBD-06	PS-04	Annual Member	Male	1970
2	433	10/13/2014 10:33	10/13/2014 10:48	SEA00486	883.831	2nd Ave & Spring St	Occidental Park / Occidental Ave S & S Washing...	CBD-06	PS-04	Annual Member	Female	1988
3	434	10/13/2014 10:34	10/13/2014 10:48	SEA00333	865.937	2nd Ave & Spring St	Occidental Park / Occidental Ave S & S Washing...	CBD-06	PS-04	Annual Member	Female	1977
4	435	10/13/2014 10:34	10/13/2014 10:49	SEA00202	923.923	2nd Ave & Spring St	Occidental Park / Occidental Ave S & S Washing...	CBD-06	PS-04	Annual Member	Male	1971

Now let's consider the popularity of the stations.



In [9]:

    
# Number of trips starting at (from) and ending at (to) each station
from_counts = df.from_station_id.value_counts()
to_counts = df.to_station_id.value_counts()
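
To see what these counts look like, here is a minimal sketch on made-up data. The frame `toy` and its station IDs are hypothetical and are not taken from the trip data file. `value_counts` returns a Series indexed by the distinct values and sorted by descending count, which is why a bar plot of it runs from the busiest station down.

```python
import pandas as pd

# Hypothetical station IDs for five trips (not from 2015_trip_data.csv)
toy = pd.DataFrame({"from_station_id": ["A", "B", "A", "C", "A"]})

# One entry per distinct station, most frequent first
toy_counts = toy.from_station_id.value_counts()
print(toy_counts)
```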

Our initial task is comparison - which stations are most popular. A bar plot seems appropriate.



In [10]:

    
from_counts.plot.bar()









    Out[10]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fce4a90a090>

Now let's plot the "to" counts.



In [11]:

    
to_counts.plot.bar()









    Out[11]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fce4a1d3850>

We want to know whether there is a general movement of bikes from one station to another. That is, are the "from" and "to" counts out of balance? This is a comparison task. One approach is to combine the two bar plots in the same figure.



In [12]:

    
plt.subplot(3,1,1)
from_counts.plot.bar()
plt.subplot(3,1,3)
to_counts.plot.bar()
# Note the use of an empty second plot to provide space between the plots









    Out[12]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fce48d5d390>
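
The empty middle row is one way to get space between the plots. Another possible approach, sketched here on made-up counts, is to create exactly two axes and set the gap explicitly with `subplots_adjust`. The series `toy_from` and `toy_to` and the value `hspace=0.8` are assumptions chosen for illustration; the `matplotlib.use("Agg")` line is only there so the sketch runs outside a notebook and should be dropped when using `%matplotlib inline`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for from_counts and to_counts
toy_from = pd.Series([5, 3, 1], index=["A", "B", "C"])
toy_to = pd.Series([4, 3, 2], index=["C", "A", "B"])

fig, axes = plt.subplots(2, 1)    # two rows, one column
fig.subplots_adjust(hspace=0.8)   # vertical gap between the two rows
toy_from.plot.bar(ax=axes[0])     # draw each series on its own axes
toy_to.plot.bar(ax=axes[1])
```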

This side-by-side comparison is deceptive, since the two plots order the stations differently along their x-axes.



In [13]:

    
count_list = [to_counts[x] for x in from_counts.index]
ordered_to_counts = pd.Series(count_list, index=from_counts.index)
plt.subplot(3,1,1)
from_counts.plot.bar()
plt.subplot(3,1,3)
ordered_to_counts.plot.bar()









    Out[13]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fce48728f90>
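
The reordering step can also be written with `Series.reindex`, shown here on made-up counts (the names `toy_from`, `toy_to`, and `toy_ordered_to` are hypothetical). This is a sketch of an alternative, not the approach used above. One difference to keep in mind: for a station missing from the second series, `reindex` yields `NaN`, whereas the list comprehension raises a `KeyError`.

```python
import pandas as pd

# Hypothetical counts with the stations in different orders
toy_from = pd.Series([5, 3, 1], index=["A", "B", "C"])
toy_to = pd.Series([4, 3, 2], index=["C", "A", "B"])

# Reorder toy_to so that its index matches the index of toy_from
toy_ordered_to = toy_to.reindex(toy_from.index)
print(toy_ordered_to)
```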

The aligned plots are still awkward to read, since it is difficult to find a specific station. It is better to sort the stations by ID.

To compare the two counts station by station, we first combine them into a single data frame and then order its index.



In [14]:

    
df_counts = pd.DataFrame({'from': from_counts.values, 'to': ordered_to_counts.values}, index=from_counts.index)
df_counts.head()



In [15]:

    
df_counts.sort_index(inplace=True)  # Modifies the calling dataframe
df_counts.head()

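To find the imbalance, compute the difference between "to" and "from" for each station. As computed in the next cell, a positive value means that more trips ended at the station than started there.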



In [17]:

    
df_outflow = pd.DataFrame({'outflow':df_counts.to - df_counts['from']}, index=df_counts.index)
df_outflow.plot.bar(legend=False)









    Out[17]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fce481e0c90>
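
To make the sign of this difference concrete, the sketch below applies the same expression to two rows copied from the `df_counts` output in this notebook. The names `toy_counts` and `toy_outflow` are introduced here for illustration.

```python
import pandas as pd

# Two rows copied from the df_counts output shown in this notebook
toy_counts = pd.DataFrame({"from": [6742, 5190], "to": [7212, 1657]},
                          index=["WF-01", "CH-07"])

# Trips ending at a station minus trips starting there
toy_outflow = toy_counts["to"] - toy_counts["from"]
print(toy_outflow)
# WF-01 is positive (more arrivals), CH-07 is negative (more departures)
```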

We can make this readable by only looking at stations with large outflows, either positive or negative.



In [20]:

    
min_outflow = 500
sel = abs(df_outflow.outflow) > min_outflow
df_outflow_small = df_outflow[sel]
df_outflow_small.plot.bar(legend=False)









    Out[20]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fce43e97a10>
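
The selection works through a boolean mask: comparing a column with a threshold gives one True or False per row, and indexing the data frame with that mask keeps only the True rows. A small sketch, using differences computed from four rows of the `df_counts` output in this notebook (the names `toy_outflow` and `toy_large` are hypothetical):

```python
import pandas as pd

# "to" minus "from" for four stations, computed from the df_counts output
toy_outflow = pd.DataFrame({"outflow": [470, -85, 1804, -3533]},
                           index=["WF-01", "BT-01", "CBD-13", "CH-07"])

min_outflow = 500
sel = toy_outflow.outflow.abs() > min_outflow  # one boolean per station
toy_large = toy_outflow[sel]                   # keeps rows where sel is True
print(toy_large)
```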

In-class exercise

Using the Pronto data, explore comparisons and trends in the number of rides by time of day (TOD), day of week (DOW), station, month, membership, and gender.
What idioms work best for the different kinds of trends?

Work in teams of three.

One person in the team will give a 2-minute summary at the end of class.
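
As a possible starting point for the exercise, the sketch below derives hour of day, day of week, and month from `starttime` strings. It assumes that every value follows the month/day/year hour:minute layout seen in the first rows of the data; the frame `toy`, its second timestamp, and the new column names are made up for illustration.

```python
import pandas as pd

# Timestamps in the layout shown by df.head(); the second one is hypothetical
toy = pd.DataFrame({"starttime": ["10/13/2014 10:31", "10/18/2014 17:05"]})

# Parse the strings, assuming month/day/year hour:minute throughout
start = pd.to_datetime(toy["starttime"], format="%m/%d/%Y %H:%M")
toy["hour"] = start.dt.hour            # time of day, 0 to 23
toy["dayofweek"] = start.dt.dayofweek  # Monday is 0, Sunday is 6
toy["month"] = start.dt.month

# Number of rides per hour of day
rides_by_hour = toy.groupby("hour").size()
print(rides_by_hour)
```

The same `groupby(...).size()` pattern applies to `dayofweek`, `month`, `usertype`, or `gender` on the full data frame.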

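Output of In [14], df_counts.head():

	from	to
WF-01	6742	7212
BT-01	5885	5800
CBD-13	5385	7189
CH-07	5190	1657
SLU-15	5006	5328

Output of In [15], df_counts.head() after sorting by station ID:

	from	to
BT-01	5885	5800
BT-03	4199	3386
BT-04	2221	1856
BT-05	3368	3459
CBD-03	2974	3959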
